    g-FSG Approach for Finding Frequent Sub Graph

    Informally, a graph is a set of nodes, pairs of which may be connected by edges. Data from a wide array of disciplines can be cast intuitively into this format. For example, computer networks consist of routers/computers (nodes) and the links (edges) between them. Social networks consist of individuals and their interconnections (business relationships, kinship, trust, etc.). Protein interaction networks link proteins which must work together to perform some particular biological function. Ecological food webs link species with predator-prey relationships. In these and many other fields, graphs are seemingly ubiquitous. The problems of detecting abnormalities (outliers) in a given graph and of generating synthetic but realistic graphs have received considerable attention recently. Both are tightly coupled to the problem of finding the distinguishing characteristics of real-world graphs, that is, the patterns that show up frequently in such graphs and can thus be considered marks of realism. A good generator will create graphs which match these patterns. In this paper we present gFSG, a computationally efficient algorithm for finding frequent patterns corresponding to geometric subgraphs in a large collection of geometric graphs. gFSG is able to discover geometric subgraphs that are rotation-, scaling-, and translation-invariant, and it can accommodate inherent errors in the coordinates of the vertices.
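
    The core frequent-subgraph idea can be sketched in a few lines. The sketch below counts, for each candidate pattern, how many graphs in a collection contain it, using plain structural isomorphism via networkx; it deliberately omits the geometric (rotation-, scaling-, and translation-invariant) matching and coordinate-error tolerance that distinguish gFSG, and the candidate pattern and toy data are illustrative.

```python
# Minimal frequent-subgraph counting sketch (structural isomorphism
# only, not gFSG's geometric-invariant matching).
import networkx as nx
from networkx.algorithms import isomorphism

def support(pattern, graphs):
    """Number of graphs in the collection containing `pattern` as a subgraph."""
    count = 0
    for g in graphs:
        gm = isomorphism.GraphMatcher(g, pattern)
        if gm.subgraph_is_isomorphic():
            count += 1
    return count

def frequent_patterns(candidates, graphs, min_support):
    """Keep candidate patterns that occur in at least min_support graphs."""
    return [p for p in candidates if support(p, graphs) >= min_support]

# Toy usage: a triangle pattern over a small graph collection.
triangle = nx.cycle_graph(3)
graphs = [nx.complete_graph(4), nx.path_graph(5), nx.cycle_graph(3)]
print(support(triangle, graphs))                           # -> 2
print(len(frequent_patterns([triangle], graphs, 2)))       # -> 1
```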

    A Novel Soft Computing Based Model For Symptom Analysis & Disease Classification

    In countries like India, many deaths occur every year because diseases are not diagnosed in time. Many people remain deprived of medication, as the doctor-to-people ratio is nearly 1:1700. Every human body and its physiological processes show some symptoms of a diseased condition. The model proposed in this paper analyzes those symptoms to identify the disease and its type. In this model, a few selected attributes are considered which appear as symptoms in a person suspected of having a particular disease. Those attributes are taken as input to the proposed symptom analysis and classification model, a soft computing model that first classifies a sample as diseased or disease-free and then, if diseased, predicts its type (if any). A number of diseased and disease-free samples are to be collected, each a collection of attributes shown by a human body. With respect to a specific disease, the collected samples form two primary clusters, one diseased and the other disease-free. The disease-free cluster may be discarded from further analysis. Every disease has some types distinguished by the symptoms it produces, so the diseased samples can be re-clustered among themselves according to disease type. Those clusters then become the classes of the multiclass classifier used to analyze a new incoming sample.
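
    A minimal sketch of this two-stage pipeline follows, assuming k-means for both clustering steps and a support vector machine as the multiclass classifier; the paper does not fix these algorithm choices, and the data, the cluster treated as "diseased", and the number of disease types here are all illustrative.

```python
# Hedged sketch of the two-stage symptom pipeline: primary clustering
# (diseased vs. disease-free), then type clustering, then classification.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))          # symptom attributes per sample

# Stage 1: two primary clusters; the disease-free cluster is discarded.
# (Which cluster is "diseased" would be decided from labeled samples;
# cluster 1 is chosen arbitrarily in this toy example.)
primary = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
diseased = X[primary.labels_ == 1]

# Stage 2: re-cluster the diseased samples into disease types;
# the resulting clusters become the classes of the multiclass classifier.
n_types = 3
types = KMeans(n_clusters=n_types, n_init=10, random_state=0).fit(diseased)
clf = SVC().fit(diseased, types.labels_)

# A new incoming sample is assigned a disease type by the classifier.
new_sample = rng.normal(size=(1, 8))
print(clf.predict(new_sample))
```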

    Predictive Data Mining: Promising Future and Applications

    Predictive analytics is the branch of data mining concerned with the prediction of future probabilities and trends. The central element of predictive analytics is the predictor, a variable that can be measured for an individual or other entity to predict future behavior. For example, an insurance company is likely to take into account potential driving-safety predictors such as age, gender, and driving record when issuing car insurance policies. Multiple predictors are combined into a predictive model, which, when subjected to analysis, can be used to forecast future probabilities with an acceptable level of reliability. In predictive modeling, data is collected, a statistical model is formulated, predictions are made, and the model is validated (or revised) as additional data becomes available. Predictive analytics is applied to many research areas, including meteorology, security, genetics, economics, and marketing. In this paper, we present an extensive study of various predictive techniques, together with their future directions and applications in various areas.
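
    The collect-model-predict-validate cycle described above can be illustrated in a few lines. The sketch below fits a logistic regression on the insurance-style predictors mentioned (age, gender, driving record); the data, encoding, and model choice are hypothetical placeholders, not a technique prescribed by the paper.

```python
# Illustrative predictive-modeling cycle: collect data, formulate a
# model over predictors, predict, and validate on held-out data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 1000
age = rng.integers(18, 80, n)
gender = rng.integers(0, 2, n)                 # encoded 0/1
violations = rng.poisson(1.0, n)               # driving-record proxy
X = np.column_stack([age, gender, violations])

# Hypothetical target: claim risk rises with violations, falls with age.
p = 1 / (1 + np.exp(-(0.8 * violations - 0.03 * (age - 40))))
y = rng.random(n) < p

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = LogisticRegression().fit(X_tr, y_tr)   # formulate the model
print("held-out accuracy:", accuracy_score(y_te, model.predict(X_te)))
# In practice the model is re-validated (or revised) as new data arrives.
```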

    Query Reformulation: Data Integration Approach to Multi Domain Query Answering System

    Data integration provides the user with a unified view of all heterogeneous data sources, and its basic service is query processing. A query posed to the system is expressed over the global schema and must be reformulated into subqueries posed to the local sources. Reformulation is accomplished by mappings between the global schema and the local sources, following the Global-as-View (GAV), Local-as-View (LAV), or Global-Local-as-View (GLAV) approach. When a query involves multiple domains, it is difficult for general-purpose search engines to extract the relevant information.
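
    The GAV case is the easiest to sketch: each global relation is defined as a view over local sources, so reformulation simply unfolds a global query into the corresponding source subqueries. In the toy sketch below, all schema, source, and query names are invented for illustration.

```python
# Minimal sketch of GAV-style query reformulation: unfold a global
# query into subqueries over the local sources that define each view.
GAV_MAPPING = {
    # global relation -> subqueries over local sources (names invented)
    "Movie(title, year)": [
        "SELECT title, year FROM imdb.films",
        "SELECT name AS title, release_year AS year FROM tmdb.movies",
    ],
    "Review(title, score)": [
        "SELECT title, score FROM critics.reviews",
    ],
}

def reformulate(global_relations):
    """Unfold a global query (the global relations it touches) into
    the subqueries to pose at each local source."""
    subqueries = []
    for rel in global_relations:
        subqueries.extend(GAV_MAPPING.get(rel, []))
    return subqueries

# A multi-domain query touching both movie data and reviews:
for q in reformulate(["Movie(title, year)", "Review(title, score)"]):
    print(q)
```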

    An advance extended binomial GLMBoost ensemble method with synthetic minority over-sampling technique for handling imbalanced datasets

    Classification is an important activity in a variety of domains, and the class imbalance problem reduces the performance of traditional classification approaches. An imbalance problem arises when the class distribution among the instances of a classification dataset is mismatched. This study proposes an advanced extended binomial GLMBoost (EBGLMBoost) model coupled with the synthetic minority over-sampling technique (SMOTE) to manage imbalance issues. SMOTE ensures that the target variable's distribution is balanced, while the GLMBoost ensemble technique is built to deal with imbalanced datasets. Twenty different datasets are used for the experiments, and the support vector machine (SVM), Nu-SVM, bagging, and AdaBoost classification algorithms are compared with the suggested method. The model's sensitivity, specificity, geometric mean (G-mean), precision, recall, and F-measure on the training and testing datasets are 99.37, 66.95, 80.81, 99.21, 99.37, 99.29 and 98.61, 54.78, 69.88, 98.77, 96.61, 98.68 percent, respectively. With the help of the Wilcoxon test, it is determined that the proposed technique performs well on unbalanced data. The proposed solution is thus capable of efficiently dealing with the class imbalance problem.
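
    The SMOTE-plus-boosting idea can be sketched as follows. GLMBoost itself is commonly an R (mboost) implementation, so the sketch substitutes scikit-learn's gradient boosting as a stand-in ensemble and uses imbalanced-learn's SMOTE; the dataset and parameters are illustrative, and the G-mean is computed from sensitivity and specificity exactly as in the metrics above.

```python
# Hedged sketch: oversample the minority class with SMOTE, train a
# boosting ensemble, and evaluate with imbalance-aware metrics.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)      # 5% minority class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance the training distribution with synthetic minority samples.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# GradientBoostingClassifier stands in for GLMBoost here.
clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
g_mean = np.sqrt(sensitivity * specificity)     # metric used in the paper
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"G-mean={g_mean:.3f}")
```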

    A Software Defined Radio based UHF Digital Ground Receiver System for Flying Object using LabVIEW

    This study demonstrates the design and implementation of a software defined radio based digital ground receiver system using LabVIEW. At a flight test centre, a command transmission system is used to transmit specific commands that execute operations inside the flight vehicle, and a ground receiver system is needed to monitor the transmitted command and verify its presence in the air. The newly implemented ground receiver system consists of an FPGA, an RTOS, and a general-purpose processing unit. Analog-to-digital conversion and RF down-conversion are carried out in high-speed PCI eXtensions for Instrumentation (PXI) Express cards. The communication algorithms, including digital down-conversion, are implemented on FPGAs. The communication system uses a digital demodulation and decoding scheme, realised on an NI PXI-7966R with a Xilinx Virtex-5 SXT FPGA. The performance of the receiver system has been analysed through measurements of pre-amplifier gain linearity, noise figure, frequency, power, and sensitivity. The results show successful implementation of the ground receiver system.
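
    Of the stages above, the digital down-conversion step is the easiest to illustrate off-FPGA. The sketch below shows the standard DDC signal path (complex mixing to baseband, anti-alias filtering, decimation) in NumPy/SciPy; all frequencies, rates, and the simulated command tone are illustrative, not the parameters of the actual LabVIEW/FPGA system.

```python
# Hedged sketch of digital down-conversion (DDC): mix the digitized
# signal to baseband with a complex oscillator, then filter/decimate.
import numpy as np
from scipy.signal import decimate

fs = 1_000_000          # sample rate of the digitized input, Hz (toy)
f_if = 100_000          # intermediate frequency to shift to baseband, Hz
t = np.arange(50_000) / fs

# Simulated received signal: a command tone at the IF plus noise.
rx = np.cos(2 * np.pi * f_if * t) + 0.1 * np.random.randn(t.size)

# Mix to baseband with a complex local oscillator.
baseband = rx * np.exp(-2j * np.pi * f_if * t)

# scipy's decimate applies an anti-aliasing filter before downsampling;
# I and Q branches are decimated separately.
i = decimate(baseband.real, 10)
q = decimate(baseband.imag, 10)
print("decimated rate:", fs // 10, "Hz; samples per branch:", i.size)
```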

    Cat Swarm based Optimization of Gene Expression Data Classification

    An Artificial Neural Network (ANN) can provide solutions to various complex problems. The generalization ability of an ANN, owing to its massively parallel processing capability, can be utilized to learn the patterns present in a data set, and those patterns can be represented as a set of rules used to solve a classification problem. However, the learning ability of an ANN degrades with the high dimensionality of the dataset. Hence, to minimize this risk we use Principal Component Analysis (PCA) and Factor Analysis (FA), which provide a feature-reduced dataset to the Multi Layer Perceptron (MLP) classifier. Further, since the weight matrices are randomly initialized, this paper uses the Cat Swarm Optimization (CSO) method to update the values of the weight matrix. The experimental evaluation shows that using CSO with the MLP classifier provides better classification accuracy than using the classifier alone.
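
    The feature-reduction half of this pipeline is sketched below: PCA compresses a high-dimensional dataset before an MLP classifies it (Factor Analysis would slot in the same way via sklearn's FactorAnalysis). Note that scikit-learn's MLP trains its weights by gradient descent; the CSO weight update proposed in the paper is not reproduced here, and the toy data and dimensions are illustrative.

```python
# Sketch of the feature-reduced classification pipeline: PCA -> MLP.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# High-dimensional toy data standing in for gene expression samples.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(PCA(n_components=20),
                      MLPClassifier(hidden_layer_sizes=(16,),
                                    max_iter=2000, random_state=0))
model.fit(X_tr, y_tr)
print("accuracy with PCA-reduced features:", model.score(X_te, y_te))
```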

    A Novel PSO-FLANN Framework of Feature Selection and Classification for Microarray Data

    Feature selection is a method of finding appropriate features in a given dataset. In the last few years a number of feature selection methods have been proposed for handling the curse of dimensionality in microarray data, which typically contains many genes measured under a huge number of conditions. The proposed framework uses two feature selection methods: Principal Component Analysis (PCA) and Factor Analysis (FA). Such data also needs a good classifier. In recent years particle swarm optimization (PSO) has been used increasingly as a technique for solving complex problems, and its parameters can be tuned for a given problem. Here, a functional link artificial neural network (FLANN) classifies the microarray data, with PSO used to tune the parameters of the FLANN. This PSO-FLANN classifier has been applied to three different microarray datasets, and the proposed model has also been compared with Discriminant Analysis (DA). The simulations on the three datasets show that PSO-FLANN achieves more than 80% accuracy.
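
    A hedged sketch of the PSO-FLANN idea follows: each input feature receives a trigonometric functional expansion (one common FLANN choice; the paper's exact basis and PSO settings are not specified here), and a standard PSO searches the weight vector of the resulting single-layer model in place of gradient training. The data and hyperparameters are toy values.

```python
# Hedged PSO-FLANN sketch: trigonometric expansion + PSO-tuned weights.
import numpy as np

rng = np.random.default_rng(0)

def expand(X):
    """Trigonometric functional expansion of each input feature."""
    return np.hstack([X, np.sin(np.pi * X), np.cos(np.pi * X)])

def error(w, Phi, y):
    pred = (Phi @ w) > 0.0                 # linear FLANN + threshold
    return np.mean(pred != y)

# Toy two-class data.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * np.sin(np.pi * X[:, 1])) > 0
Phi = expand(X)

# Standard PSO over the FLANN weight vector.
n_particles, dim = 30, Phi.shape[1]
pos = rng.normal(size=(n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_err = np.array([error(p, Phi, y) for p in pos])
gbest = pbest[pbest_err.argmin()].copy()

w_inertia, c1, c2 = 0.7, 1.5, 1.5          # common PSO defaults (toy)
for _ in range(100):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = (w_inertia * vel + c1 * r1 * (pbest - pos)
           + c2 * r2 * (gbest - pos))
    pos += vel
    errs = np.array([error(p, Phi, y) for p in pos])
    improved = errs < pbest_err
    pbest[improved], pbest_err[improved] = pos[improved], errs[improved]
    gbest = pbest[pbest_err.argmin()].copy()

print("training error of PSO-tuned FLANN:", error(gbest, Phi, y))
```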

    Rough ACO: A Hybridized Model for Feature Selection in Gene Expression Data

    Dimensionality reduction of a feature set is a common preprocessing step used in pattern recognition, classification applications, and compression schemes. Rough Set Theory is one of the popular methods used, and it can be shown to be optimal under different optimality criteria. This paper proposes a novel method for dimensionality reduction that chooses a subset of the original features containing most of the essential information, hybridizing Ant Colony Optimization (ACO) with Rough Set Theory; we call this method Rough ACO. The proposed method is successfully applied to choose the best feature combinations and then to find the reduced set of features from gene expression data using the Upper and Lower Approximations.
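
    The rough-set building blocks named above are easy to sketch: for a chosen feature subset, samples are grouped into indiscernibility classes, and a target decision set is bracketed by its Lower and Upper Approximations. The toy decision table below is illustrative, not gene expression data, and the ACO search over feature subsets is omitted.

```python
# Minimal rough-set sketch: indiscernibility classes and the lower /
# upper approximations of a target decision set.
from collections import defaultdict

def indiscernibility_classes(rows, features):
    """Group row indices that agree on every chosen feature."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        key = tuple(row[f] for f in features)
        classes[key].add(i)
    return list(classes.values())

def approximations(rows, features, target):
    lower, upper = set(), set()
    for cls in indiscernibility_classes(rows, features):
        if cls <= target:          # class entirely inside the target
            lower |= cls
        if cls & target:           # class overlaps the target
            upper |= cls
    return lower, upper

# Toy decision table: two condition features and a decision attribute.
rows = [
    {"g1": "hi", "g2": "lo", "d": 1},
    {"g1": "hi", "g2": "lo", "d": 0},
    {"g1": "lo", "g2": "hi", "d": 1},
    {"g1": "lo", "g2": "lo", "d": 0},
]
target = {i for i, r in enumerate(rows) if r["d"] == 1}
print(approximations(rows, ["g1", "g2"], target))  # ({2}, {0, 1, 2})
# A feature subset whose lower approximation matches that of the full
# feature set is a candidate reduct; ACO searches among such subsets.
```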